RNA-seq analysis identifies transcriptomic profiles associated with anal cancer recurrence among people living with HIV

Abstract Background Chemoradiation therapy (CRT) is the standard of care for squamous cell carcinoma of the anus (SCCA), the most common type of anal cancer. However, approximately one fourth of patients still relapse after CRT. Methods We used RNA-sequencing technology to characterize coding and non-coding transcripts in tumor tissues from CRT-treated SCCA patients and compare them between 9 non-recurrent and 3 recurrent cases. RNA was extracted from FFPE tissues. Library preparations for RNA-sequencing were created using SMARTer Stranded Total RNA-Seq Kit. All libraries were pooled and sequenced on a NovaSeq 6000. Function and pathway enrichment analysis was performed with Metascape and enrichment of gene ontology (GO) was performed with Gene Set Enrichment Analysis (GSEA). Results There were 449 differentially expressed genes (DEGs) observed (390 mRNA, 12 miRNA, 17 lincRNA and 18 snRNA) between the two groups. We identified a core of upregulated genes (IL4, CD40LG, ICAM2, HLA-I (HLA-A, HLA-C) and HLA-II (HLA-DQA1, HLA-DRB5) in the non-recurrent SCCA tissue enriching to the gene ontology term ‘allograft rejection’, which suggests a CD4+ T cell driven immune response. Conversely, in the recurrent tissues, keratin (KRT1, 10, 12, 20) and hedgehog signaling pathway (PTCH2) genes involved in ‘Epidermis Development,’, were significantly upregulated. We identified miR-4316, that inhibit tumor proliferation and migration by repressing vascular endothelial growth factors, as being upregulated in non-recurrent SCCA. On the contrary, lncRNA-SOX21-AS1, implicated in the progression of many other cancers, was also found to be more common in our recurrent compared to non-recurrent SCCA. 
Conclusions
 Our study identified key host factors which may drive the recurrence of SCCA and warrants further studies to understand the mechanism and evaluate their potential use in personalized treatment. Key Message Our study used RNA sequencing (RNA-seq) to identify pivotal factors in coding and non-coding transcripts which differentiate between patients at risk for recurrent anal cancer after treatment. There were 449 differentially expressed genes (390 mRNA, 12 miRNA, 17 lincRNA and 18 snRNA) between 9 non-recurrent and 3 recurrent squamous cell carcinoma of anus (SCCA) tissues. The enrichment of genes related to allograft rejection was observed in the non-recurrent SCCA tissues, while the enrichment of genes related to epidermis development was positively linked with recurrent SCCA tissues.


Introduction
anal cancer comprises 1.5% of gastrointestinal malignancies [1]. although multiple tumor types can arise in the anal canal, squamous cell carcinoma of the anus (scca) is the most common histologic diagnosis (85%) and is often referred to as 'anal cancer (ac)' . it mainly occurs in older adults in the general population, with an average age at diagnosis in the early 60s [2]. in 2019, there were an estimated 74,752 people living with anal cancer in the Us. in 2022, an estimated 9400 new cases will be diagnosed, along with approximately 1670 deaths nationwide [3]. according to the most recent surveillance report by the National cancer institute, the overall incidence increased by 4.6% per year from 2001 to 2009, with no significant reduction from 2009 to 2015 [4]. the risk of scca is 24-fold higher in women living with hiV and >50-fold higher in men living with hiV (MlWh) compared to the general population [5][6][7][8]. the incidence rates of scca are 129-fold higher in men who have sex with men (MsM) with hiV, compared to the general Us population [9]. it is the second most common non-aiDs-defining malignancy and a leading cause of morbidity among people living with hiV (PlWh) in the Us. chemoradiation therapy (cRt), the standard treatment for scca [10][11][12], has shown a superior treatment rate compared to other treatments. however, 10-26% of patients do not have a complete response to cRt, and about 24% of patients may relapse [13][14][15][16]. Further, treatment outcomes for PlWh are not as well described as for individuals living without hiV. Given the limited number of scca cases, a conventional clinical trial is limited and molecular approaches have been explored. Yanik et al. [17] indicated that there were no differences with immune-related expression profiles or check-point molecule expression (e.g. PD-l1) in scca by hiV status. however, Yanik's study combined tumor tissues from treated and untreated individuals and importantly only select candidate immune-related genes were examined in 3 PlWh and 4 patients without hiV. a multi-platform profiling analysis of scca tissues by smaglo et al. suggested over-expression of proteins involved in resistance to chemotherapeutic drugs such as excision repair cross-complementing gene 1 (ERCC1) and multi-drug resistance-associated protein 1 (MRP1) that offered therapeutic investigation [18]. Other genomic profiling studies suggested no unique patterns between primary scca and recurrent scca [19]; mutations in PIK3CA, FBXW7, MLL2, and MLL3 appeared to be common among primary and metastatic scca [20][21][22][23]. however, to date there are no biological signatures that determine cRt response in scca patients.
identifying predictive biomarkers regarding therapeutic impacts and disease prognosis using RNa-sequencing (RNa-seq) based approaches has been widely applied in the field of cancer research [24]. While the use of this approach revealed significant coding and non-coding transcripts in various cancers [25], transcriptomic signatures in scca are lacking, specifically with treatment response outcomes. in this study, we characterized gene expression profiling in scca tumor tissues from PlWh to compare coding (mRNa) and non-coding (e.g. miRNa, incRNa) transcripts that were differentially expressed among successful cRt versus recurrent scca patients. the transcriptomic biomarkers identified in this study can provide insights into the pathological processes driving observed treatment outcomes and need to be validated in larger cohorts in future studies to determine whether they can be accurately used to screen for early diagnosis of patients at high risk of having recurrent anal cancer.

Cohort and study design
electronic health records (ehR) of PlWh attending the University of alabama at Birmingham (UaB) hiV clinic (the 1917 clinic) between 1 January 2006 and 31 March 2018 were reviewed. each patient was retrospectively reviewed 5 years after the cancer diagnosis or until the end of the most recent clinical visit. in accordance with standard procedures at the UaB comprehensive cancer center (UaB-ccc), all scca (icD-O-3 site code c20.9, c21.0-c21.9) are recorded in the ehR. these were reviewed by an oncologist and the linked cancer tissues in the biorepositories were microscopically confirmed and validated by a pathologist. Non-recurrence is defined as the absence of disease at the site of the primary tumor and regional lymph nodes within 6 months from the end of chemoradiation therapy. local recurrence (lR) is defined as persistent disease or recurrence at the site of the primary tumor. each patient's ehR was retrospectively reviewed up to 5 years after the last session of chemoradiation therapy to determine if lR occurred. tumor tissues linked with the confirmed scca diagnoses were requested from the UaB tissue Biorepository.

RNA extraction
RNa was extracted from unstained formalin-fixed, paraffin-embedded (FFPe) slides. We obtained five 5 mm-thick unstained FFPe slides of each patient (9 cR and 3 lR). the tissues from five FFPe sections were combined and used for the simultaneous isolation of total RNa and DNa with Quick-DNa/RNa FFPe kit (R1009) from Zymo Research, costa Mesa, ca according to the manufacturer's protocol. RNa was treated with DNase1 in the columns. the concentration and purity of RNa were assessed using Nanodrop spectrophotometer (thermo scientific, Waltham, Ma). additional Qa/Qc of the samples was conducted in the Genomics core laboratory at the University of Minnesota using agilent Bioanalyzer/tapestation analysis. assays were performed with 20 ul volume, with a minimal of total 400 ng of RNa.

Library preparation and RNA-sequencing
libraries were created using sMaRter stranded total RNa-seq Kit v2 -Pico input Mammalian (takara Bio Usa, inc, san Jose, ca), as described in the manufacturer's technical note (https://www.takarabio.com/Pdf/ RenderPdf ). all libraries were pooled and sequenced on a Novaseq 6000 (illumina, san Diego, ca). Depths of >40 million paired-end 150 bp reads were generated for each sample and all expected barcodes were detected. the overall sequencing quality among all libraries was consistently good, with an overall average mean quality score of >Q30.

Bioinformatics analysis
Raw FastQ sequences were first trimmed to remove adapter sequences to a minimum sequence length of 75 bp with a mean Phred score cutoff of 30 to ensure only high-quality reads were used for alignment and counting using trim Galore as previously described [26,27]. input RNa-seq reads were aligned to the GRch38 human reference genome using hisat2 software [28][29][30][31]. transcript abundance was measured using featurecounts and the matrix of counts served as the unit for the differentially expressed gene (DeG), which was conducted using the bioconductor R package, edgeR [32]. Utilizing the 'summarizeOverlaps (mode = Union)' function, only those reads that overlapped with at least one feature, as well as overlapped between paired-end reads were counted. a combination of negative binomial distribution, Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests were used to determine whether statistically significant differences existed between counts of one group of sequencing data to another through tMM normalized count matrices, controlling for sequencing depth between samples and datasets. Function and pathway enrichment analysis was performed with Metascape and enrichment of gene ontology (GO) was performed with Gene set enrichment analysis (Gsea) [33,34].

Statistical analysis
Baseline clinical and sociodemographic characteristics were compared between local recurrent and non-recurrent scca. statistical comparisons with p < 0.05 were considered significant. cancer staging was also summarized using the american cancer society's scca stage criteria [35]. Pca (Principle component analysis) was used to examine the variance in overall expression patterns in each sample in a linear combination and the uncorrelated factors separated the orthogonal components. DeGs were considered upregulated or downregulated if (log 2 fold change (log 2 Fc))>2 and (log 2 Fc) >-2, respectively and p-value < 0.05 were found between the recurrent and non-recurrent groups. edgeR automatically adjusts p-values generated for each individual gene for multiple comparisons as shown in the user guide [32]. to examine various correlations of the transcripts, Volcano Plots and bubble plots were generated in R; heatmaps, and bar graphs as well as other visualizations were generated using a combination of R, Metascape, cytoscape, Gsea, and PathwayViz [33,34,36,37].
the study was approved by the Office of the institutional Review Board of human Use at UaB (iRB-300000131). it fully adhered to the Declaration of helsinki. table 1 shows the demographics of the PlWh studied in the study. study participants included 3 lR and 9 cR anal cancer patients, with male gender only. all participants underwent chemoradiation therapy as the initial primary treatment. Race/ethnicity, sexual risks (MsM), and hiV-related clinical indicators cD4 t counts at different time points were not statistically significant between recurrent and non-recurrent samples (table 1).

Principal component analysis (PCA)
Following gene-expression normalization, initial Pca indicated that a subset of all except one non-recurrent clustered and another subset of three recurrent clustered (Figure 1 a). however, a subsequent permutation analysis of variance analysis (PeRMaNOVa) confirmed that recurrence was not a significant factor governing the overall variance of samples within the Pca (p > 0.05).

Identification of DEGs between recurrent and non-recurrent samples
there were 449 DeGs (390 mRNa, 12 miRNa, 17 lin-cRNa and 18 snRNa) observed when comparing the recurrent to the non-recurrent group (Figures 1(B)). two genes, ACAP3 and MRPS11 passed the false discovery rate (FDR) <0.05. Volcano plot (Figure 1(c)), indicated all the differentially expressed mRNas that were statistically significant in the two groups. heatmaps of specific miRNas, lincRNas and snRNas that were differentially expressed are shown in Figures  1(D, e, F), respectively.

GO Functional annotation for DEGs using Metascape
Functional enrichment analysis with Metascape found that DeGs between recurrent and non-recurrent were significantly enriched in antigen processing and presentation (-logP = 3.0), allograft rejection (-logP = 7.31), GPCR signaling (-logP = 4.21), and IFNG production (-logP = 3.76) (Figure 2(a)). Figure 2(B) shows results from Gsea in recurrent and non-recurrent scca and the number of genes in each and the -logP-value. Of all GO terms, only 3 were shared between groups, the regulation of peptidase activity (-logP = 3.99 or 3.48 in recurrent or non-recurrent tissues respectively), defense response (-log p = 2.01 or 2.85 in recurrent or non-recurrent tissues respectively), and staphylococcus aureus infection (-logP = 3.11 or 4.69 in recurrent or non-recurrent tissues respectively), though the specific genes enriching to these terms were not (Figure 2(B)). a number of genes involved in 'allograft rejection' also contributed to the 'immune system process' term, as the same immunologic processes involved in rejecting non-self allografts may overlap with cancer rejection (Figure  2(c)) [38]. DeGs enriched in the recurrent tissues were mainly involved in epidermis development (-logP = 9.54) (Figure 2(D)). in the 'epidermis development' term, a number of keratin genes were included such as KRT1, KRT10, KRT12, and KRT20, as well as hedgehog signaling pathway transmembrane receptor gene PTCHptch2 (Figure 2(D)).

Discussion
chemoradiation therapy (cRt) has emerged as the standard of care for anal cancer; however, data have shown treatment is ineffective in up to 40% of the treated patients, with recurrence in the same anatomical sites in 10-14% of patients. Given the significant gap in knowledge as to why treatment response differs, our study revealed many DeGs (coding and non-coding transcriptomics) that could potentially: a) help understand the biological mechanism; and b) serve as clinical biomarkers for recurrence of anal cancer post-therapy. Overall, we observed signatures of productive immune responses or epidermal development in non-recurrent or recurrent anal cancer isolates respectively. Very little has been published regarding the two genes (ACAP3 and MRPS11), which passed the FDR< 0.05 in our study. a previous study suggested that downregulated ACAP3 was associated with ferroptosis, a type of cell death, in thyroid papillary carcinoma and implied a novel direction for cancer therapy [39]. another study found extremely down-regulated ACAP3 in patients with late-stage liver cancer compared to the early-stage [40]. MRPS11 appeared to be a robust prognostic indicator in uveal melanoma; up-regulated MRPS11 was associated with worse overall survival [41]. therefore, upregulation of these two genes may serve as a biomarker of likely recurrence. those genes contributing to 'defense response' and 'peptidase activity' terms in recurrent tissues were mainly composed of a number of genes previously associated with cancer such as S100A8, S100A9, TANK, or ADORA2B [42][43][44][45]. conversely, there were a number of productive immune genes contributing to 'defense response' in non-recurrent anal cancer isolates such as IL4, HLA-A, CD200R1, and CD96. CD96 is commonly expressed on the surface of cD8+ t cells and NK cells and has been implicated as an immune checkpoint inhibitor capable of suppressing iFNG production as well as cytotoxicity and worsening patient prognosis in cancer [46,47]. For the non-recurrent isolates a number of integrins (ICAM2) and collagen proteins (COL4A3, COL6A3) contributed to the 'regulation of peptidase activity' term. additionally, human leukocyte antigen (hla) components from both Mhc class ia and ii essential to antigen presentation and presentation were upregulated only in the non-recurrent samples including HLA-A, HLA-C and Mhc-ii genes such as HLA-DRB5 and HLA-DQA1. hla class i molecules are critical to the anti-cancer immune response, as has been shown previously, mediated primarily through recognition and lysis by cytotoxic t lymphocytes (ctls) [48]. Previous meta-analysis reported that patients with overexpressed hla class i molecules contributed to better overall survival. it may imply that a high expression level of hla-a and hla-c in initial non-recurrent anal cancer isolates may thereby result in better chances of tumor-free survival.
Of the miRNas differentially expressed ( Figure 1D), miR-4316 seems to indicate some biological significance in other cancers. it is significantly downregulated in gastric tumor tissues and cell lines compared to Figure 2. Gene ontology (Go) enrichment for deGs upregulated in recurrent or non-recurrent anal cancer isolates. (A) Metascape enrichment bubblemap of Go terms describing deGs in each group, with relative number of deGs from each group contributing to each term/node represented as individual pie charts, colored by membership. (B) number of Go terms for deGs in each group represented as venn diagram and most relevant biological processes represented as a bubbleplot colored by Abs(log(p-value)) and size constrained by number of deGs contributing to term and ordered by Abs(logP). Representative heatmaps of deGs contributing to the Go terms with the highest enrichment score for each group are shown for (c) non-recurrent (Allograft Rejection) and (d) recurrent groups (epidermis development), colored by reads per kilobase exon per million mapped reads (RPKM) expression value for each deG.
healthy tissues [49]. the study suggested that miR-4316 inhibited tumor proliferation and migration by repressing vascular endothelial growth factor a (VeGF-a) [49]. Downregulation of miR-4316 in bladder cancer tissues elevates expression of ZBTB2 [50]. ZBTB2 promotes tumorigenesis by repressing p21 and facilitating PDK4 transcription [50]. in breast cancer, miR-4316 is inversely associated with an over-expressed circular RNa, circMYO9B in tumor tissues [51]. in colorectal cancer, miR-4316 was over-expressed compared to healthy tissues, suggesting it could be a potential biomarker for early colorectal cancer diagnosis [52].
similarly to the lincRNas differentially expressed in our study, sNORa71B and -sOX21-as1 have been indicated in the pathogenesis of other cancers. lincRNa-sNORa71B promotes the migration of breast cancer cells across the blood-brain barrier via epithelial-mesenchymal transition (eMt), leading to brain metastasis from initial breast cancer [53,54]. in prostate cancer, the knockdown of sNORa71B reduces tumor cell proliferation, invasion, migration, and eMt [55]. SOX21-AS1 plays a pivotal role in multiple cancers, including, lung, cervical, breast, oral, and colorectal cancers [56][57][58][59] and is involved in increased tumor proliferation, migration, and invasion in tumor tissues [56][57][58][59]. the lincRNa sOX21-as1, as well as SOX21, SOX2, and PIK3CA (encoding a subunit of Pi3K), were found to be highly expressed in the recurrent tissues of this study (Figure 1e). a number of studies showed that mutations within PIK3CA gene were found in multiple hPV-induced squamous cell cancers, such as head and neck cancers, cervical, and anal cancers [18][19][20][21][22][23]. sOX21-as1's target is sOX21, which regulates SOX2 and ultimately the PI3K pathway [60]. PI3K and SOX2 have previously been identified as regulators of the 'stemmness' of related head and neck squamous cancer stem cells and may thereby be acting to promote scca recurrence through a similar mechanism [61].
Non-recurrent tissues, representing those neoplasms that were isolated before any recurrence, may be seen as the 'initial' onset of anal cancer. in comparison, therefore, those recurrent isolates represent a successive re-emergence of these tumors after initial removal and treatment, and therefore can be treated as those that were refractory to treatment. in these isolates, a number of genes were identified as upregulated over initial non-recurrent samples, mainly involved in epidermal development such as a number of keratin-coding genes (KRT1, 10,12,20), hedgehog signaling (PTCH2), and genes previously identified as contributing to the proliferation of a diverse array of other cancers (S100A8, A9, TANK, ADORA2B) [42][43][44][45][62][63][64]. Many of these DeGs were also associated with water/lipid homeostasis, which is consistent with disruption of the epidermal barrier maintained by keratinocytes, melanocytes, and other epidermal stem cells typically responsible for protection against water loss [65].
several limitations of the study were noted. in the study, we only analyzed tissue samples from 9 non-recurrent patients and 3 recurrent patients; thus, the power for statistical analysis is limited; however, it is one of the first studies to compare the complete transcriptional profile of rare cancer based on treatment outcome in rare cancer. While the current findings need to be replicated and validated in a larger patient population, this study provides some preliminary insights into the biology of scca that can be followed up in the future. the staging of scca is not uniform in all the patients; however, there was heterogeneity in both groups and no differences statistically, so no major bias is expected in the analysis due to staging status. in addition, the study was based on ehR data from one hospital, which means other clinical data of the patients, such as treatment or outcomes outside of UaB healthcare system may be missed. however, the study participants included in the study were diagnosed, treated, and followed up at the same UaB healthcare system, reducing some of these limitations in this study.
Overall, this study suggests that a complex immune-regulatory network may be acting within initial non-recurrent anal cancer isolates which is disrupted upon recurrence. a number of productive immune signatures were identified as being upregulated within initial non-recurrent isolates alongside several immune checkpoint molecules that may play a role in the recurrence of subsequent anal cancer post-treatment. cD4+ t cell function and infiltration into the sequenced non-recurrent anal cancer isolates, specifically, seem to be the main immune mediators that disappear in recurrence. these results suggest that the absence of transcriptional signatures of productive cD4+ t cell activation and subsequent humoral responses are the main immune components that contribute to the recurrence of anal cancer. additionally, this study has identified several potential transcriptional signatures that may prove useful as biomarkers to predict the recurrence of anal cancer post-resection of a tumor in patients.

Author contributions
YY and ss were responsible for the study conception and design; YY and aDJ contributed to acquisition of the data; YY, KJM, and hWW conducted the analysis of the data and aDJ assisted; aB, sls and ss helped ensure the accuracy of the analysis and interpretation of the data; YY and KJM drafted the manuscript and ss finalized the overall draft; OaM prepared and extracted RNa from FFPe tissues and assisted with data collection; MK, saD and GaB assisted with acquisition of clinical data and interpretation of data; all coauthors reviewed, edited and approved the final version of the manuscript before submitting for publication.

Disclosure statement
No conflicts of interest were declared for any of the authors.
Funding this research was supported by the tissue Procurement shared Resource of the O'Neal comprehensive cancer center (P30ca013148)/UaB-tissue Biorepository (UaB-tBR). this work was also supported by the Quetelet endowed Professorship Research Fund (ss). the UaB centers for aiDs Research supported YY for the anal cancer research (the 2018 World aids Day Poster award). sls is supported by the National cancer institute (K07 ca225404).

Data availability statement
the data that support the findings of this study are available from the corresponding author ss, upon reasonable request.